
    Greedy structure learning from data that contains systematic missing values

    Learning from data that contain missing values is a common phenomenon in many domains. Relatively few Bayesian Network (BN) structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets whose values are not missing at random. The absence of such approaches impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit the potential bias caused by missing values. The first two variants can be viewed as sub-versions of the third and best-performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm in both learning accuracy and efficiency, whether data are missing at random or not at random.
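
    The two mechanisms named above can be illustrated with a short sketch. The Python code below is a minimal, hypothetical helper (not the authors' exact procedure): it applies pairwise deletion to a single pair of discrete variables and then re-weights the remaining rows by inverse probability weighting, producing a weighted contingency table of the kind a greedy scoring step might consume. The function name, the `miss_predictors` argument, and the assumption that those predictor columns are fully observed are all illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def ipw_pairwise_counts(df, x, y, miss_predictors):
    """Hypothetical helper: pairwise deletion plus inverse probability
    weighting (IPW) for one pair of discrete variables (x, y).

    Rows where either x or y is missing are dropped (pairwise deletion);
    the remaining rows are re-weighted by the inverse of their estimated
    probability of being observed, estimated here with a logistic
    regression on covariates assumed to be fully observed."""
    observed = df[x].notna() & df[y].notna()                 # pairwise deletion mask
    Z = pd.get_dummies(df[miss_predictors], drop_first=True)
    clf = LogisticRegression(max_iter=1000).fit(Z, observed.astype(int))
    p_obs = clf.predict_proba(Z)[:, 1]                       # P(row observed | covariates)
    weights = 1.0 / np.clip(p_obs[observed.values], 1e-3, None)

    sub = df.loc[observed, [x, y]].copy()
    sub["w"] = weights
    # Weighted contingency table, used in place of raw counts when scoring the x-y pair
    return sub.pivot_table(index=x, columns=y, values="w",
                           aggfunc="sum", fill_value=0.0)
```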

    Privatising Prisons: An endeavour for efficiency or a misinterpretation of priorities?

    This essay examines the meaning of New Public Management, its history in both economic theory and political ascension, and challenges its efficacy in relation to the prison service. New Public Management's mix of new institutional economics and managerialism brought with it a wealth of claims that this new managerial practice would supersede and improve upon the traditional public administration that had preceded it. The public sector as it stood in the 1980s was weighing heavily on the state, and a refreshed and assertive Conservative government pushed strongly to fulfil its overall objectives. This body of work uses case study methodology, from the perspective of two instrumental cases, to assess what has happened to New Public Management as it has been implemented in the private sector without much due care. The focus has been on excessively expanding competition through contracting out and market testing, along with the aggressive rise of many PFI prisons in the UK. The data collated indicate that this alternative style of management, allegedly superior to the traditional approach, has failed to consider the intangible underlying concepts and values that hold up the prison system and our criminal justice system as a whole. This essay seeks to explain why business and punishment do not coalesce.

    Bayesian networks for prediction, risk assessment and decision making in an inefficient Association Football gambling market.

    Researchers have witnessed great success in deterministic, perfect-information domains, where intelligent pruning and evaluation techniques have proven sufficient to deliver outstanding decision-making performance. However, approaches that model uncertainty and risk in real-life situations have not met with the same success. Association Football has been identified as an ideal and exciting application in this respect: it is the world's most popular sport and constitutes the fastest-growing gambling market at international level. As a result, summarising the risk and uncertainty around the outcomes of football match events has increased dramatically in both importance and difficulty. A gambling market is described as inefficient if there are one or more betting procedures that generate profit, at a consistent rate, by exploiting market flaws. This study exhibits evidence of an (intended) inefficient football gambling market and demonstrates how a Bayesian network model can be employed against market odds for the gambler's benefit. A Bayesian network is a graphical probabilistic model that represents the conditional dependencies among uncertain variables, which can be both objective and subjective. We have proposed such a model, which we call pi-football, and used it to generate forecasts for English Premier League matches during seasons 2010/11 and 2011/12. The proposed subjective variables represent factors that are important for prediction but that historical data fail to capture, and forecasts were published online at www.pi-football.com prior to the start of each match. For assessing the performance of our model we have considered both profitability and accuracy measures, and we demonstrate that subjective information significantly improved the forecasting capability of our model. The resulting match forecasts are sufficiently accurate relative to market odds for the model to demonstrate profitable returns at a consistent rate. This PhD research was funded by the Engineering and Physical Sciences Research Council (EPSRC), with software support provided by Agena Ltd.

    Information fusion between knowledge and data in Bayesian network structure learning

    Bayesian Networks (BNs) have become a powerful technology for reasoning under uncertainty, particularly in areas that require causal assumptions enabling us to simulate the effect of intervention. The graphical structure of these models can be determined by causal knowledge, learnt from data, or a combination of both. While it seems plausible that the best approach to constructing a causal graph involves combining knowledge with machine learning, this approach remains underused in practice. We implement and evaluate 10 knowledge approaches, applied to different case studies and BN structure learning algorithms available in the open-source Bayesys structure learning system. The approaches enable us to specify pre-existing knowledge, obtained from heterogeneous sources, that constrains or guides structure learning. Each approach is assessed in terms of structure learning effectiveness and efficiency, including graphical accuracy, model fitting, complexity, and runtime, making this the first paper to provide a comparative evaluation of a wide range of knowledge approaches for BN structure learning. Because the value of knowledge depends on what data are available, we illustrate the results with both limited and big data. While the overall results show that knowledge becomes less important with big data, because higher learning accuracy reduces the need for it, some of the knowledge approaches are actually found to be more important with big data. Amongst the main conclusions is the observation that a reduced search space obtained from knowledge does not always imply reduced computational complexity, perhaps because the relationships implied by the data and the knowledge are in tension.
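
    As a rough illustration of what a knowledge approach can look like in practice, the sketch below encodes three common kinds of constraint, namely required edges, forbidden edges and temporal tiers, and uses them to filter candidate arc additions during greedy search. The class and method names are hypothetical and do not reflect the Bayesys interface; the ten approaches evaluated in the paper are richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    """Illustrative container for prior knowledge constraints
    (names are hypothetical, not the Bayesys API)."""
    required: set = field(default_factory=set)   # arcs that must be present
    forbidden: set = field(default_factory=set)  # arcs that must never be added
    tiers: dict = field(default_factory=dict)    # variable -> temporal tier

    def allows(self, parent, child):
        """Return True if the arc parent -> child may be added during search."""
        if (parent, child) in self.forbidden:
            return False
        if (child, parent) in self.required:     # would contradict a required arc
            return False
        if self.tiers and self.tiers.get(parent, 0) > self.tiers.get(child, 0):
            return False                          # arcs cannot point backwards in time
        return True

# During greedy search, candidate 'add arc' moves are filtered with k.allows(p, c).
k = Knowledge(forbidden={("Cancer", "Smoking")}, tiers={"Smoking": 0, "Cancer": 1})
print(k.allows("Smoking", "Cancer"), k.allows("Cancer", "Smoking"))  # True False
```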

    Effective and efficient structure learning with pruning and model averaging strategies

    Learning the structure of a Bayesian Network (BN) with score-based solutions involves exploring the search space of possible graphs and moving towards the graph that maximises a given objective function. Some algorithms offer exact solutions that are guaranteed to return the graph with the highest objective score, while others offer approximate solutions in exchange for reduced computational complexity. This paper describes an approximate BN structure learning algorithm, which we call Model Averaging Hill-Climbing (MAHC), that combines two novel strategies with hill-climbing search. The algorithm starts by pruning the search space of graphs, where the pruning strategy can be viewed as an aggressive version of the pruning strategies typically applied to combinatorial optimisation problems in structure learning. It then performs model averaging during the hill-climbing search, moving to the neighbouring graph that maximises the objective function on average, where the average is taken over that neighbouring graph and all of its own valid neighbours. Comparisons with other algorithms spanning different classes of learning suggest that the combination of aggressive pruning with model averaging is both effective and efficient, particularly in the presence of data noise.
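
    The move rule described above can be sketched in a few lines. The code below is a simplified, self-contained illustration of hill climbing guided by an averaged score, where each candidate neighbour is ranked by the mean objective over itself and all of its own valid neighbours. Arc reversals, acyclicity checks and the aggressive pruning stage are omitted, and the toy objective is invented for the example, so this is a sketch of the idea rather than the MAHC implementation itself.

```python
import itertools

VARS = ["A", "B", "C", "D"]

def neighbours(edges):
    """Single-arc additions and removals (arc reversals and acyclicity
    checks are omitted to keep the sketch short)."""
    out = []
    for p, c in itertools.permutations(VARS, 2):
        if (p, c) in edges:
            out.append(edges - {(p, c)})
        elif (c, p) not in edges:
            out.append(edges | {(p, c)})
    return out

def averaged_score(edges, score_fn):
    """Mean objective over the graph itself and all of its valid neighbours."""
    pool = [edges] + neighbours(edges)
    return sum(score_fn(g) for g in pool) / len(pool)

def model_averaging_hill_climb(score_fn, max_iter=100):
    """Hill climbing guided by the averaged score of each candidate move,
    rather than by the candidate's own score alone."""
    current = frozenset()
    for _ in range(max_iter):
        best = max(neighbours(current), key=lambda g: averaged_score(g, score_fn))
        if averaged_score(best, score_fn) <= averaged_score(current, score_fn):
            break
        current = best
    return current

# Toy objective: reward arcs of a known 'true' graph, penalise extra arcs.
TRUE = {("A", "B"), ("B", "C")}
def toy_score(g):
    return sum(e in g for e in TRUE) - 0.1 * len(g)

print(sorted(model_averaging_hill_climb(toy_score)))  # [('A', 'B'), ('B', 'C')]
```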

    Profiting from an inefficient association football gambling market: Prediction, risk and uncertainty using Bayesian networks

    We present a Bayesian network (BN) model for forecasting Association Football match outcomes. Both objective and subjective information are considered for prediction, and we demonstrate how probabilities are transformed at each level of the model, with predictive distributions following hierarchical levels of Bayesian inference. The model was used to generate forecasts for each match of the 2011/2012 English Premier League (EPL) season, and forecasts were published online prior to the start of each match. Profitability, risk and uncertainty are evaluated by considering various unit-based betting procedures against published market odds. Compared to a previously published successful BN model, the model presented in this paper is less complex and generates even more profitable returns.
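
    One simple unit-based betting procedure of the kind evaluated against market odds is a value bet: stake one unit whenever the model probability multiplied by the bookmaker's decimal odds exceeds one, i.e. whenever the model implies positive expected value. The sketch below illustrates that rule on an invented match record; it is not the paper's exact procedure, and the probabilities and odds shown are hypothetical.

```python
def unit_value_bets(forecasts):
    """Illustrative unit-based rule (not the paper's exact procedure):
    stake one unit on an outcome whenever model probability * decimal odds > 1,
    i.e. whenever the model implies positive expected value against the market."""
    bankroll = 0.0
    for match in forecasts:
        for outcome, p_model in match["model"].items():
            odds = match["odds"][outcome]
            if p_model * odds > 1.0:                     # value bet
                bankroll += (odds - 1.0) if outcome == match["result"] else -1.0
    return bankroll

# Hypothetical match: home/draw/away model probabilities vs. market decimal odds.
forecasts = [
    {"model": {"H": 0.50, "D": 0.25, "A": 0.25},
     "odds":  {"H": 2.50, "D": 3.40, "A": 4.00},
     "result": "H"},
]
print(unit_value_bets(forecasts))  # 1.5: one winning unit bet on the home side
```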

    Open problems in causal structure learning: A case study of COVID-19 in the UK

    Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which is necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to COVID-19 UK pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and by groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, and predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.
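
    Graphical structure is commonly assessed by comparing a learned graph with a reference graph using a metric such as the Structural Hamming Distance (SHD). The sketch below shows one simple directed-graph variant of SHD; the paper's exact metrics may differ, and the variable names in the example are invented rather than taken from the study.

```python
def shd(learned, true):
    """Structural Hamming Distance between two directed graphs, each given as
    a set of (parent, child) arcs: one point per missing, extra or reversed
    arc, with a reversed arc counted once rather than as missing plus extra."""
    learned, true = set(learned), set(true)
    reversed_arcs = {(a, b) for (a, b) in learned
                     if (b, a) in true and (a, b) not in true}
    missing = {e for e in true
               if e not in learned and (e[1], e[0]) not in learned}
    extra = {e for e in learned
             if e not in true and (e[1], e[0]) not in true}
    return len(reversed_arcs) + len(missing) + len(extra)

# Invented example with COVID-style variable names (not the study's graphs).
true_dag    = {("Infections", "Hospitalisations"), ("Hospitalisations", "Deaths")}
learned_dag = {("Hospitalisations", "Infections"), ("Hospitalisations", "Deaths"),
               ("Tests", "Deaths")}
print(shd(learned_dag, true_dag))  # 2: one reversed arc plus one extra arc
```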

    From complex questionnaire and interviewing data to intelligent Bayesian network models for medical decision support

    OBJECTIVES: 1) To develop a rigorous and repeatable method for building effective Bayesian network (BN) models for medical decision support from complex, unstructured and incomplete patient questionnaires and interviews that inevitably contain repetitive, redundant and contradictory responses; 2) To exploit expert knowledge in the BN development, since further data acquisition is usually not possible; 3) To ensure the BN model can be used for interventional analysis; 4) To demonstrate why using data alone to learn the model structure and parameters is often unsatisfactory, even when extensive data are available. METHOD: The method is based on applying a range of recent BN developments targeted at helping experts build BNs given limited data. While most of the components of the method are based on established work, its novelty is that it provides a rigorous, consolidated and generalised framework that addresses the whole life-cycle of BN model development. The method is grounded in two original and recently validated BN models in forensic psychiatry, known as DSVM-MSS and DSVM-P. RESULTS: When employed with the same datasets, the DSVM-MSS demonstrated competitive to superior predictive performance (AUC scores 0.708 and 0.797) against the state-of-the-art (AUC scores ranging from 0.527 to 0.705), and the DSVM-P demonstrated superior predictive performance (cross-validated AUC score of 0.78) against the state-of-the-art (AUC scores ranging from 0.665 to 0.717). More importantly, the resulting models go beyond improved predictive accuracy: they are useful for risk management through intervention, and they enhance decision support by answering complex clinical questions based on unobserved evidence. CONCLUSIONS: This development process is applicable to any domain that involves large-scale decision analysis based on such complex information, rather than on data with hard facts, in conjunction with the incorporation of expert knowledge for decision support via intervention. The novelty extends to challenging decision scientists to reason about building models based on what information is really required for inference, rather than on what data are available, and hence forces decision scientists to use available data in a much smarter way.
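
    Objective 3 above, interventional analysis, can be made concrete with a small, self-contained example of why intervening differs from ordinary conditioning in a BN with a confounder. The three-variable model and all probabilities below are invented for illustration and are unrelated to DSVM-MSS or DSVM-P.

```python
# Hypothetical three-node BN with a confounder: C -> X, C -> Y and X -> Y.
# All probabilities are invented for illustration only.
P_C = {1: 0.3, 0: 0.7}                               # P(C = c)
P_X_given_C = {1: 0.8, 0: 0.2}                       # P(X = 1 | C = c)
P_Y_given_XC = {(1, 1): 0.9, (1, 0): 0.6,            # P(Y = 1 | X = x, C = c)
                (0, 1): 0.5, (0, 0): 0.1}

def joint(c, x, y):
    """Joint probability P(C = c, X = x, Y = y) under the BN factorisation."""
    px = P_X_given_C[c] if x == 1 else 1 - P_X_given_C[c]
    py = P_Y_given_XC[(x, c)] if y == 1 else 1 - P_Y_given_XC[(x, c)]
    return P_C[c] * px * py

# Observational query: P(Y = 1 | X = 1), i.e. ordinary conditioning.
p_obs = (sum(joint(c, 1, 1) for c in (0, 1)) /
         sum(joint(c, 1, y) for c in (0, 1) for y in (0, 1)))

# Interventional query: P(Y = 1 | do(X = 1)): sever C -> X and average over P(C).
p_do = sum(P_C[c] * P_Y_given_XC[(1, c)] for c in (0, 1))

print(round(p_obs, 3), round(p_do, 3))  # 0.789 vs 0.69: the two queries disagree
```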